Skip to content

fix(studio): filter runtime-generated nodes from resolver-shadow telemetry#1795

Merged
vanceingalls merged 1 commit into
mainfrom
sdk-resolver-shadow-runtime-filter
Jun 30, 2026
Merged

fix(studio): filter runtime-generated nodes from resolver-shadow telemetry#1795
vanceingalls merged 1 commit into
mainfrom
sdk-resolver-shadow-runtime-filter

Conversation

@vanceingalls

@vanceingalls vanceingalls commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

What

Reduce noise in the studio:sdk_resolver_shadow telemetry tripwire and add metadata to triage what remains. Telemetry-only — no disk writes, no change to user-visible edits.

Why

The resolver-shadow tripwire (PR #1547) emits element_not_found whenever the SDK session can't resolve an hf-id the server patch path targets. PostHog (SDK Resolver Shadow Parity) showed the dominant live class is runtime-generated nodes — elements a composition <script> creates at runtime (caption word/group spans, etc.). These have no static data-hf-id, so the SDK session (a static parse) cannot model them by design. They are noise, not a resolver bug, and they were drowning the genuine v0.6.110-class divergences the tripwire exists to catch.

Investigation (last 14d of events) ruled out the alternatives:

  • Content mismatch / stale session — ruled out: comp == selFile in 39/40 top rows.
  • Already-fixed class-1 (inlined sub-comp bare-leaf, fix(studio): shadow resolves bare leaf via dispatch path, not getElement #1552/v0.6.112) — the historical top-firers (hf-0ytc etc.) are all pre-fix 0.6.111 noise.
  • The biggest live firer, captions.html .captions-container, traces to packages/studio/src/captions/generator.ts building word/group nodes via createElement at runtime.

Changes

  • Runtime-node filter (sdkResolverShadowCheck / runResolverShadow): suppress element_not_found when the resolved hf-id is absent from the on-disk source. An id present in source but missing from the session stays flagged — that's the real resolver divergence. The "is it in source" test is the independent signal; "is it in the session" is circular (that's what the tripwire measures). Scoped to the DOM-edit path, where every observed runtime-node divergence lives.
  • sessionElementCount on all element_not_found / animation_not_found emits: 0 = empty/broken session (actionable), >0 = element-specific.
  • sourceHfIdCount on emitted element_not_found: =1 = a single static node the parse dropped (foreign-content exclusion / sub-comp inlining gap), >1 = duplicate ids → resolver picked the wrong instance. Forks the one remaining unexplained bucket (#soe-comment-style template ids) directly from telemetry.

Test

packages/studio/src/utils/sdkResolverShadow.test.ts — added cases: absent-from-source → suppressed; present-in-source-but-missing → still flagged; duplicate-id → sourceHfIdCount: 2. 32 pass under vitest.

Not in scope

  • Root-cause fix for the template-id bucket — needs the sourceHfIdCount split from a day of telemetry first.
  • Element-targeted paths (setTiming / removeElement / reorder / addGsapTween) have no sync source access and are low-volume; left unguarded.

🤖 Generated with Claude Code

…metry

The sdk_resolver_shadow tripwire flagged element_not_found for nodes a
composition <script> creates at runtime (caption word/group spans, etc.).
These have no static data-hf-id, so the SDK session (a static parse) cannot
model them by design; the divergence is noise, not a resolver bug.

- Runtime-node filter: suppress element_not_found when the resolved hf-id is
  absent from the on-disk source. An id PRESENT in source but missing from the
  session stays flagged (the genuine v0.6.110-class resolver divergence).
- Add sessionElementCount to all element_not_found / animation_not_found emits
  (0 = empty/broken session, >0 = element-specific).
- Add sourceHfIdCount to emitted element_not_found: =1 = static node the parse
  dropped (foreign-content exclusion / sub-comp gap), >1 = duplicate-id
  resolver ambiguity.

Scoped to the DOM-edit path. Telemetry-only; no disk writes, no edit change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copy link
Copy Markdown
Collaborator Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@miga-heygen miga-heygen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid, well-scoped PR. The runtime-node filter logic is sound, the telemetry metadata enrichments are useful for triaging the remaining buckets, and the test coverage hits the important cases. CI is green on all required checks. A few observations:

1. includes vs countHfIdInSource semantic gap (minor — metadata inconsistency, not a bug)

The filter gate at L180 uses sourceContent.includes(hfId) (raw substring), while countHfIdInSource uses the attribute-scoped data-hf-id="${id}". For an id like hf-a where source only contains data-hf-id="hf-abc", includes("hf-a") returns true (correctly keeps the signal — no false suppression), but countHfIdInSource("...", "hf-a") returns 0. The emitted event would carry sourceHfIdCount: 0 for an element_not_found that survived the filter — a "present in source but zero attribute matches" state that could confuse triage dashboards. The bias direction is correct (overshoot toward keeping signal), but making the filter gate use countHfIdInSource(sourceContent, hfId) > 0 instead of includes would keep both checks on the same semantic level and avoid the zero-count anomaly. Not blocking — the noise budget on this edge is tiny (hf-ids with shared prefixes are uncommon) and the "ponytail" comment documents the intent.

2. sessionElementCount double-call in runResolverShadow

sdkResolverShadowCheck already calls resolveSnapshot → session.getElements() internally, and then runResolverShadow calls session.getElements().length again for the telemetry payload. Since getElements() returns a fresh snapshot array each time, this is a second full traversal. For the telemetry-only hot path on every style/text/attr edit, the double call is worth noting. Not blocking since the check only reaches telemetry on divergence (early return on mismatches.length === 0), so the second call fires only on the already-rare emit path.

3. Test coverage

The three new C8 cases cover the filter logic well (suppressed / kept / count). One thing that would strengthen the suite: a test for the includes vs countHfIdInSource gap itself — e.g. id "hf-a" against source data-hf-id="hf-abc" — to document the expected behavior when the loose match hits but the tight count doesn't. Not blocking; flagging it as a future hardening opportunity.

4. Everything else looks clean

  • countHfIdInSource split technique is correct (no regex escaping needed for hf-ids).
  • sourceContent?: string optional parameter is backward compatible.
  • Existing redaction untouched — no PII risk from the new count fields.
  • The intentional exclusion of recordResolverParity / recordAnimationResolverParity from the runtime-node filter is well-documented in the PR body.
  • recordAnimationResolverParity caches session.getElements() to avoid the double-call — nice touch.

Overall: clean telemetry-only change with correct bias direction. LGTM with the minor notes above.

— Miga

@miguel-heygen miguel-heygen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10743d85.

Verdict: APPROVE.

The runtime-node filter is directionally correct and scoped to resolver-shadow telemetry only: when Studio resolves a live DOM hf-id that is absent from the on-disk source, the event is suppressed because the SDK static session cannot model script-created runtime nodes by design. Static ids that are present in source but missing from the SDK session still emit, preserving the actionable v0.6.110-class resolver-divergence signal.

Verified:

  • useDomEditSession now passes originalContent only into the shadow check path; edit persistence behavior is unchanged.
  • sdkResolverShadowCheck suppresses absent-from-source element_not_found cases and keeps present-in-source missing-session cases.
  • sourceHfIdCount is telemetry metadata only; source content itself is not emitted.
  • sessionElementCount is added to element/animation not-found telemetry, which should make empty-session vs element-specific buckets easier to triage.
  • Tests cover the important suppression/keep/count cases.

Non-blocking: I agree with Miga that the sourceContent.includes(hfId) gate and countHfIdInSource() use slightly different semantics. A shared countHfIdInSource(sourceContent, hfId) > 0 gate would avoid rare substring-prefix anomalies (hf-a vs hf-abc) where an event survives with sourceHfIdCount: 0. The current bias is safe because it keeps extra signal instead of suppressing a real bug.

CI has no failures at review time; Windows render/test jobs are still pending.
— Magi

@vanceingalls vanceingalls merged commit a4eacae into main Jun 30, 2026
54 of 60 checks passed
@vanceingalls vanceingalls deleted the sdk-resolver-shadow-runtime-filter branch June 30, 2026 07:16

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10743d85 — layering on Miga's COMMENTED and Magi's APPROVED reviews from this morning. ✅ LGTM (telemetry-only, scoped, correct bias direction).

Summary

Adds a runtime-node filter (sourceContent.includes(hfId) gate) to suppress noisy element_not_found for script-created nodes, plus sessionElementCount on all emit sites and sourceHfIdCount on the DOM-edit path's element_not_found events. Filter live only on runResolverShadow (DOM-edit path) — the element-targeted cutover chokepoints (recordResolverParity, recordAnimationResolverParity) stay unguarded, which the PR body documents as deliberate (no sync source access; low-volume).

Findings

🟡 Dashboard-consumer impact of the includes vs countHfIdInSource gap (extending Miga's finding 1)

Miga's note about the semantic gap is mechanically correct (verified — "data-hf-id=\"hf-abc\"".includes("hf-a") is true; countHfIdInSource(src, "hf-a") returns 0). The downstream consequence is worth surfacing explicitly: the PR body and the comment at sdkResolverShadow.ts:277-281 document sourceHfIdCount as a {=1: single-static-parse-drop, >1: duplicate-id} two-bucket signal, but the gap introduces a third bucket — =0 — for ids that pass the loose includes gate as a longer-id prefix. Anyone querying PostHog with sourceHfIdCount == 1 to size the "parser dropped a static node" cohort will miss those events, and sourceHfIdCount > 0 won't catch them either.

Two cheap fixes, take either:

  1. Tighten the gate to countHfIdInSource(sourceContent, hfId) > 0 (Miga's suggestion) — keeps the two-bucket invariant by construction.
  2. Keep the loose gate, but document =0 as the third explicit bucket ("loose-prefix match — same disposition as runtime-only; suppress at dashboard layer") and add a regression test pinning the behavior. The "ponytail" comment already nods at this; just make the dashboard-side intent explicit.

I'd take (1) — the cost is one extra split-pair on the hot-path early return, and it preserves the documented mental model. But not blocking.

🟡 Test for the includes/count boundary (seconding Miga's #3)

A test like the one Miga sketched — id "hf-a" against source data-hf-id="hf-abc" — would pin the current behavior either way (whether you take fix #1 or fix #2 above). The C8 cases as-written cover the canonical paths (runtime-only / static / dup) but don't cover the prefix-overlap corner that's now load-bearing for the filter's correctness.

🟡 runResolverShadow failure-path coverage (per the observability rubric)

The new metadata is only emitted on the element_not_found mismatch branch (correct). But the existing dispatch_error branch at sdkResolverShadow.ts:214 returns a mismatch without the runtime-node filter being consulted (the filter only sits in the unresolved-snapshot early return at L177-187), so a dispatch_error on an actually-runtime-only hfId would still emit. Not a new gap — same behavior as before this PR — but the PR's stated goal is "reduce noise on the DOM-edit path." If captions.html-class runtime ids ever reach the dispatch path (i.e. resolveSnapshot succeeds via the bare-match fallback at L86-89), they'd skip the new filter. Worth a follow-up grep to confirm the bare-match fallback never resolves a runtime-only id to a real session element. If it does, that's a different noise bucket the filter doesn't catch.

✅ Verified clean

  • originalContent at useDomEditCommits.ts:153-157 is the full file body (read via /api/projects/.../files/<targetPath>), so the includes substring search has the right surface — it's not scoped to a selection subtree.
  • trackStudioEvent (studioTelemetry.ts:69-77) is synchronous queue-push; no import().then async-import race for early-call telemetry loss (one of the failure-modes I always check on telemetry PRs — clean here).
  • PostHog payload values are bounded: sessionElementCount is an int, sourceHfIdCount is an int, no user content. No cardinality / PII concern from the new fields. Existing redactMismatches redaction of expected/actual is untouched.
  • recordAnimationResolverParity cache of session.getElements() (L344) is a nice incidental cleanup; matches Miga's note.
  • The intentional exclusion of recordResolverParity / recordAnimationResolverParity from the source-filter is documented in the PR body ("low-volume; left unguarded") and the source-access constraint is real — sdkTimingPersist / sdkGsapTweenPersist don't read the file.

✅ Sibling-asymmetry check (clean)

The three trackStudioEvent("sdk_resolver_shadow", ...) emit sites at L272 / L312 / L347 are intentionally asymmetric on sourceHfIdCount (DOM-edit only) and intentionally symmetric on sessionElementCount (all three). Matches the PR-body intent. The redaction shape is identical across all three.

Verdict

LGTM. The includes vs count gap (Miga's #1) is the most actionable polish; my dashboard-bucket reframe makes the "fix vs document" tradeoff explicit. None of it blocks merge — the bias direction is correct (keep signal when in doubt), and the documented goal of cutting captions.html-class noise is preserved.

— Rames D Jusso

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants